Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(ai): apply code improvements to AudioToText pipeline #4

Merged
merged 2 commits into from
Jul 15, 2024

Conversation

rickstaa
Copy link
Collaborator

@eliteprox here are some small code improvements I found.

@@ -75,6 +75,9 @@ var (
ErrProfEncoder = fmt.Errorf("unknown VideoProfile encoder for protobufs")
ErrProfName = fmt.Errorf("unknown VideoProfile profile name")

ErrUnsupportedAudioFormat = fmt.Errorf("audio format unsupported")
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eliteprox this is a very usefull error which should be implemented but currently is not used. I tried inputting the following unsupported audio OGG filetype:

25eb6f27.zip

But it threw a audio duration calculation error. Can we implement a audio file type unsupported error?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking file extensions isn't super reliable and ffmpeg has some edge cases where it returns 0 duration for supported file types, so we currently just check if the duration is <= 0.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The status code returned from GetCodecInfoBytes (which currently ignored) should indicate something that indicates the file format is unsupported, although I don't know exactly what the code would be. However I am unsure why the API returns both a code and an error; ideally it should be one or the other.

For what it is worth in this specific case, ffmpeg should be able to detect the duration for this file. Ogg is probably just not compiled into in the production LPMS build, which has a slimmed-down version of ffmpeg. Adding the ogg demuxer to install_ffmpeg.sh might be enough for this case.

@rickstaa rickstaa force-pushed the add-speech-to-text-code-improvements branch from 3e88726 to 5104725 Compare July 14, 2024 14:42
This commit applies several code improvements to the AudioToText
codebase.
@rickstaa rickstaa force-pushed the add-speech-to-text-code-improvements branch from 5104725 to cb4360b Compare July 14, 2024 14:48
@eliteprox eliteprox merged commit d40d41b into add-speech-to-text Jul 15, 2024
2 of 9 checks passed
@rickstaa rickstaa deleted the add-speech-to-text-code-improvements branch July 15, 2024 17:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants